ABSTRACT

Statistical  models  usually  involve  some  degree  of  approximation  and 
therefore  are  nearly  always  wrong.  Because  of  this  inexactness,  an  assessment 
of  the  influence  of  minor  perturbations  of  the  model  is  important.  We  discuss 
a method for carrying out such an assessment. The method is not restricted to
a  particular  class  of  models,  and  it  seems  to  provide  a  relatively  simple, 
unified  approach  for  handling  a  variety  of  problems. 
ASSESSMENT OF LOCAL INFLUENCE


R. Dennis Cook

1. INTRODUCTION

Statistical models are extremely useful devices for extracting and understanding the essential features of a set of data. Models, however, are nearly always approximate descriptions of more complicated processes and therefore are nearly always wrong. Because of this inexactness, the study of the variation in the results of an analysis under modest modifications of the problem formulation becomes important. If a minor modification of an approximate description seriously influences the key results of an analysis, there is surely cause for concern. On the other hand, if such modifications are found to be unimportant, the sample is robust with respect to the induced perturbations (Barnard 1980) and our ignorance of the precise model will do no harm.

Although  an  assessment  of  the  influence  of  a  model  perturbation  is 
generally  considered  to  be  important,  few  general  methods  are  available  for 
carrying  out  such  an  assessment  in  contexts  other  than  normal  linear 
regression,  and  much  of  the  past  work  is  concerned  with  only  the  perturbation 
scheme  in  which  the  weights  attached  to  individual  or  groups  of  cases  are 
modified.  Cook  (1977,  1979)  and  Belsley,  Kuh  and  Welsch  (1980)  propose 
diagnostics  for  assessing  the  influence  of  case  weight  perturbations  in  linear 
regression.  For  the  most  part,  the  case  weights  are  restricted  to  be  either 
0  or  1  so  that  a  case  is  either  deleted  or  retained  at  full  weight.  These 
ideas  are  adapted  for  use  in  logistic  regression  by  Pregibon  (1981). 
Moolgavkar,  Lustbader  and  Venzon  (1984)  give  a  number  of  useful  results  on 


Sponsored  by  the  United  States  Army  under  Contract  No.  DAAG29-80-C-0041 


case deletion diagnostics for general exponential families, and Lustbader and
Moolgavkar (1984) investigate the change in the score test on deletion of
cases. Oman (1984) develops measures for assessing the influence of
individual  cases  in  calibration  problems. 

Andrews  and  Pregibon  (1978),  Atkinson  (1982)  and  Johnson  and  Geisser 
(1982)  also  propose  diagnostics  based  on  case  deletion  schemes.  For  a  review 
of these works and related literature, see Cook and Weisberg (1982).

Attempts  to  provide  a  firm  foundation  for  diagnostics  based  on  case 
weight  perturbation  schemes  are  described  in  Cook  and  Weisberg  (1982)  and 
Welsch  (1982).  These  attempts  are  based  on  the  influence  curve,  a 
construction  that  relies  on  an  appropriate  functional  of  the  true  underlying 
distribution function. The influence curve has been of value in the
formulation  of  robust  estimators,  but  it  may  be  more  of  a  hindrance  than  a 
help  in  the  present  context.  To  employ  this  idea  for  the  construction  of  an 
influence  diagnostic  we  must  construct  the  influence  curve,  choose  one  of  the 
many  sample  versions  and  then  select  a  suitable  norm.  Even  in  normal  linear 
regression  this  process  seems  to  obscure  rather  than  illuminate  the  problem  at 
hand.  The  difficulty  involved  in  carrying  out  the  program  for  more 
complicated  settings  is  a  further  annoyance. 

This  paper  presents  a  general  method  for  assessing  the  local  influence  of 
minor perturbations of a statistical model. The method relies on a well-defined
likelihood and certain elementary ideas from differential geometry,
and seems to provide a relatively simple, unified approach for handling a
variety  of  problems.  Barnard  (1980)  gives  a  brief  general  discussion  on  using 
the  likelihood  to  assess  the  consequences  of  model  perturbations.  Although 
this  paper  is  concerned  primarily  with  local  influence,  some  discussion  of 
assessing  global  influence,  which  is  a  significantly  more  difficult  problem, 
will  be  given  also. 
In the next section, we introduce the idea of an influence graph, a
quantity  which  seems  fundamental  to  the  study  of  influence  as  described 
earlier  in  this  section.  In  section  3,  we  discuss  numerical  summaries  of 
influence  graphs.  Several  illustrations  are  given  in  section  4  and  section  5 
contains  concluding  comments. 


2.  INFLUENCE  GRAPHS 

2.1 Motivation

Consider the standard linear regression model

$$Y = X\beta + \varepsilon \qquad (1)$$

where the elements of the $n \times 1$ vector $\varepsilon$ are assumed to be independent normal random variables with mean zero and known variance $\sigma^2$. To motivate the developments of this section, we use model (1) and the following form of the influence statistic proposed by Cook (1977),

$$D_i = \|\hat{Y} - \hat{Y}_{(i)}\|^2 / p\sigma^2 \qquad (2)$$

where $\hat{Y}$ and $\hat{Y}_{(i)}$ are the $n \times 1$ vectors of fitted values based on the full data and the data without case $i$, respectively, and $p$ is the dimension of $\beta$. A similar motivation can be constructed by using other case deletion diagnostics. For example, since $\sigma^2$ is known, $pD_i = (\mathrm{DFFITS})^2$ from Belsley, Kuh and Welsch (1980).

The statistic $D_i$ in (2) can be usefully viewed as a basis for detecting cases that should be carefully inspected for gross errors. The finding of a gross error must necessarily force the removal or correction of the corresponding case, and such actions may cause a substantial change in the results of an analysis if $D_i$ is large.

Generally, case deletion diagnostics allow for only one of two possibilities: a case is either as specified by the model or totally unreliable (variance $\to \infty$). Other reasonable and equally important concerns are not reflected by such diagnostics. For example, we might postulate a model with constant variance but admit that the true variances could range between $\sigma^2/2$ and $2\sigma^2$, a level of heteroscedasticity that will often go undetected in practice. To investigate this specific concern, we use the following slightly more general version of $D_i$,

$$D_i(\omega) = \|\hat{Y} - \hat{Y}_\omega\|^2 / p\sigma^2 \qquad (3)$$

where $\hat{Y}_\omega$ is the vector of fitted values obtained when the $i$-th case has weight $\omega$ and the remaining cases have weight 1. Of course, as $\omega \to 0$, $\mathrm{var}(\varepsilon_i) \to \infty$ and $D_i = D_i(0)$. If $D_i(\omega)$ is large then the stipulation that the $i$-th case has variance $\sigma^2/\omega$ rather than $\sigma^2$ will lead to substantial changes in the results of the analysis.

At first glance it might seem that $D_i$ and $D_i(\omega)$ would always give essentially the same information. This does not seem to be the case, however. Figure 1 gives plots of $pD_i(\omega)$ versus $\omega$ for two possible cases A and B from model (1). The details behind Figure 1 will be presented later. For now we note that the analysis is clearly more sensitive to alterations in the weight attached to case B since $D_B(\omega) - D_A(\omega) \ge 0$ and for some $\omega$ this difference is substantial. We must have $D_A(1) = D_B(1)$, of course. However, the fact that $D_A(0) = D_B(0)$ means that the two cases will be judged to be equally influential when using $D_i$. It seems clear that case deletion diagnostics alone are not sufficient to handle concerns other than gross errors. In particular, for a more complete understanding of the influence of a single case it is necessary to investigate the behavior of $D_i(\omega)$ at values of $\omega$ other than $\omega = 0$.

In the next section we extend these ideas to general models in which $\omega$ can be used to perturb model components other than case weights. This extension is based on the following relationship between $D_i(\omega)$ and the log likelihood $L(\beta)$ for model (1),

$$pD_i(\omega) = [\|Y - \hat{Y}_\omega\|^2 - \|Y - \hat{Y}\|^2]/\sigma^2 = 2[L(\hat\beta) - L(\hat\beta_\omega)] \qquad (4)$$

where $\hat\beta = \hat\beta_1$ and $\hat\beta_\omega$ is the maximum likelihood estimator of $\beta$ when the $i$-th case has weight $\omega$. This relationship was pointed out in the special case $\omega = 0$ by Cook and Weisberg (1982, Chapter 5).
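Relationship (4) can be checked numerically. The following is a minimal sketch on simulated data (the data, the seed and the helper names `weighted_fit` and `log_lik` are illustrative assumptions, not part of the paper): it fits model (1) with and without a reduced weight on one case and compares the scaled change in fitted values with twice the drop in the unperturbed log likelihood.

```python
import numpy as np

def weighted_fit(X, y, w):
    """Weighted least squares estimate of beta for case weights w."""
    XtW = X.T * w                      # column j of X.T is scaled by w[j]
    return np.linalg.solve(XtW @ X, XtW @ y)

def log_lik(X, y, beta, sigma2):
    """Unperturbed normal log likelihood L(beta), up to an additive constant."""
    r = y - X @ beta
    return -0.5 * (r @ r) / sigma2

rng = np.random.default_rng(0)         # simulated data, for illustration only
n, p, sigma2 = 20, 3, 1.0
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

i, omega = 4, 0.3                      # give case i the weight omega
w = np.ones(n)
w[i] = omega
beta_hat = weighted_fit(X, y, np.ones(n))
beta_w = weighted_fit(X, y, w)

# Left side of (4): p * D_i(omega) = ||Yhat - Yhat_omega||^2 / sigma^2
lhs = np.sum((X @ beta_hat - X @ beta_w) ** 2) / sigma2
# Right side of (4): 2 [L(beta_hat) - L(beta_hat_omega)]
rhs = 2.0 * (log_lik(X, y, beta_hat, sigma2) - log_lik(X, y, beta_w, sigma2))
print(lhs, rhs)
```

The two printed numbers should agree to round-off, because the ordinary residual vector is orthogonal to the column space of $X$.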


2.2  Development 

For a given statistical problem, let $\theta$ denote the $p \times 1$ vector of unknown parameters and let $L(\theta)$ denote the log likelihood corresponding to the postulated model. We introduce perturbations into the model through the $q \times 1$ vector $\omega$ which is restricted to some open subset $\Omega$ of $R^q$. Generally, $\omega$ can reflect any well-defined perturbation scheme and thus is not restricted to be a collection of case weights. For example, $\omega$ might be used to induce a minor modification of the explanatory variables in a generalised linear model, or to perturb the entire covariance matrix of the errors in a normal linear model. As illustrated in the examples of section 4, $\omega$ must be chosen carefully so that the application is sensible. For now we assume this choice to have been made.

Let $L(\theta \mid \omega)$ denote the log likelihood corresponding to the perturbed model for a given $\omega$ in $\Omega$. We assume that there is a unique $\omega_0$ in $\Omega$ such that $L(\theta) = L(\theta \mid \omega_0)$ for all $\theta$. Finally, let $\hat\theta$ and $\hat\theta_\omega$ denote the maximum likelihood estimators under $L(\theta)$ and $L(\theta \mid \omega)$, respectively, and assume that $L(\theta \mid \omega)$ is continuous and twice differentiable in $(\theta^T, \omega^T)$.

To assess the influence of varying $\omega$ throughout $\Omega$, we initially consider the likelihood difference

$$LD(\omega) = 2[L(\hat\theta) - L(\hat\theta_\omega)]. \qquad (5)$$

In a particular problem, specific characteristics of $\{\hat\theta_\omega \mid \omega \in \Omega\}$ might be relevant, but $LD(\omega)$ is a useful universally applicable feature that can be interpreted in terms of the large sample confidence region for $\theta$ (Cox and Hinkley, 1974, Chapter 9)

$$\{\theta \mid 2[L(\hat\theta) - L(\theta)] \le \chi^2_\alpha(p)\}.$$

Here, $\chi^2_\alpha(p)$ is the upper $\alpha$ probability point of a chi-squared distribution with $p$ degrees of freedom. The motivation for (5) comes largely from (4), but some alternatives will be discussed later. For further discussion see Cook and Weisberg (1982, Chapter 5) and Pregibon (1981).

From this perspective, a graph of $LD(\omega)$ versus $\omega$ contains essential information on the influence of the perturbation scheme in question. It is useful to view this graph as the geometric surface

$$\alpha(\omega) = \begin{pmatrix} \omega \\ LD(\omega) \end{pmatrix}. \qquad (6)$$

In differential geometry a surface of this form is frequently called a Monge patch. We will refer to $\alpha(\omega)$ as an influence graph since it is the graph of $LD(\omega)$ that displays the influence of the perturbation scheme. In retrospect, Figure 1 displays two possible influence graphs for the scheme in which the weight attached to a single case in linear regression is varied.

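The rationale that led to the influence graph $\alpha(\omega)$ is not the only reasonable approach, of course. Suppose that we partition $\theta^T = (\theta_1^T, \theta_2^T)$, where $\theta_1$ is $p_1 \times 1$, and agree that only $\theta_1$ is of interest. In this situation the analog of (6) is

$$\alpha_s(\omega) = \begin{pmatrix} \omega \\ LD_s(\omega) \end{pmatrix} \qquad (7)$$

where

$$LD_s(\omega) = 2[L(\hat\theta) - L(\hat\theta_{1\omega}, g(\hat\theta_{1\omega}))],$$

$g(\theta_1)$ is the function that maximizes $L(\theta_1, \theta_2)$ over $\theta_2$ for each fixed $\theta_1$ and $\hat\theta_{1\omega}$ is determined from the partition $\hat\theta_\omega^T = (\hat\theta_{1\omega}^T, \hat\theta_{2\omega}^T)$. The motivation behind (7) comes in part from the large sample confidence region for $\theta_1$ (Cox and Hinkley, 1974, Chapter 9)

$$\{\theta_1 \mid 2[L(\hat\theta) - L(\theta_1, g(\theta_1))] \le \chi^2_\alpha(p_1)\}.$$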


The influence graph defined in (7) reflects a special interest. On the other hand, a somewhat different but related perspective leads to the influence graph

$$\alpha'(\omega) = \begin{pmatrix} \omega \\ LD'(\omega) \end{pmatrix} \qquad (8)$$

where

$$LD'(\omega) = 2[L(\hat\theta_\omega \mid \omega) - L(\hat\theta \mid \omega)].$$

In the construction of this graph, the moving frame of reference $L(\theta \mid \omega)$ is used to compare $\hat\theta_\omega$ and $\hat\theta$, while $\alpha(\omega)$ was constructed by using the fixed frame of reference $L(\theta)$ for the same comparison. Both $\alpha$ and $\alpha'$ may be useful for assessing influence.

Ideally, we would like a complete influence graph, such as those displayed in Figure 1, to assess influence in a particular problem. Clearly, this is possible in only the simplest situations so that it becomes necessary to consider other methods for extracting the information contained in an influence graph. Global measures of influence, which characterize the behavior of an influence graph over all $\omega$ in $\Omega$, are generally much more difficult to construct in practice than local measures which characterize behavior in a neighborhood of a selected $\omega$, say $\omega^*$.

In normal linear regression, the various influence diagnostics that rely on case deletion ($D_i$ for example) can be regarded as local measures since they are designed to measure influence on various "corners" of $\Omega = (0,1)^n$, where $n$ is the sample size. However, from Figure 1 and the discussion of section 2.1, it is clear that the behavior of an influence graph around $\omega^* = \omega_0 = 1$ may be as relevant as the behavior at the corners of $\Omega$.

In the next section we suggest a local measure of influence for characterizing the behavior of an influence graph around $\omega^* = \omega_0$.


3.  LOCAL  INFLUENCE 


The behavior of an influence graph around $\omega_0$ is accurately reflected by the geometric normal curvature at $\omega_0$. For $q = 1$ this curvature can be viewed as the inverse of the radius of the circle which best approximates an influence graph at $\omega_0$ or as the rate of change of the angle between the tangent vector at $\omega_0$ and the horizontal axis. This curvature easily distinguishes between the two influence graphs shown in Figure 1: the curvature is $2(.05)^2 pD_i$ for case A and $2(.99)^2 pD_i$ for case B.

In this section we use normal curvatures to characterize the behavior of an influence graph around $\omega_0$. The normal curvature of a surface ($\alpha(\omega)$ in this application) should be discussed in any first text on differential geometry. Sufficient background information is available in Bates and Watts (1980). For convenience we use $\alpha(\omega)$ as defined in (6) to develop normal curvatures. The other types of influence graphs discussed in section 2.2 will be compared later in this section. Also we will initially develop the normal curvatures at an arbitrary $\omega^*$, although our primary interest is in the case $\omega^* = \omega_0$. Curvatures at points other than $\omega_0$ may be of some value in assessing the global behavior of an influence graph.

3.1 Curvatures for $\alpha(\omega)$

For $q > 1$ consider a straight line in $\Omega$ passing through $\omega^*$. Such a line can be represented by

$$\omega(a) = \omega^* + a\ell \qquad (9)$$

where $a \in R^1$ and $\ell$ is a fixed nonzero vector in $R^q$. This line generates a lifted line on the influence graph $\alpha(\omega)$ passing through $\alpha(\omega^*)$. Each direction $\ell$ specifies such a lifted line and for each lifted line we can imagine a normal curvature as discussed in connection with Figure 1. For a given direction $\ell$, let $C_\ell(\omega^*)$ denote the normal curvature of $\alpha(\omega)$ at $\omega^*$.

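Let $V$ denote the $(q+1) \times q$ matrix with elements $\partial\alpha_i(\omega)/\partial\omega_j$, $i = 1,2,\ldots,q+1$, $j = 1,2,\ldots,q$. Here $\alpha_i$ is the $i$-th component of $\alpha$ and all derivatives are evaluated at $\omega^*$. Further, let $W_{jk}$ denote the $(q+1) \times 1$ vector with elements $\partial^2\alpha_i(\omega)/\partial\omega_j\partial\omega_k$, $i = 1,2,\ldots,q+1$. Then the velocity and acceleration vectors in the direction $\ell$ are respectively

$$\dot\alpha_\ell = V\ell \qquad (10)$$

$$\ddot\alpha_\ell = \sum_j \sum_k W_{jk}\,\ell_j \ell_k \qquad (11)$$

where $\ell = (\ell_j)$. The normal curvature $C_\ell(\omega^*)$ can now be written as

$$C_\ell(\omega^*) = \|P_V \ddot\alpha_\ell\| / \|\dot\alpha_\ell\|^2 \qquad (12)$$

where $P_V$ is the projection operator for the null space of $V^T$, that is, the orthogonal complement of the column space of $V$. Carrying out the operations indicated in (12) we find

$$C_\ell(\omega^*) = \frac{2|\ell^T \ddot F \ell|}{(1 + \|\dot F\|^2)^{1/2}\; \ell^T (I + \dot F \dot F^T)\ell} \qquad (13)$$

where $\dot F$ is the $q \times 1$ vector with elements $2\,\partial L(\hat\theta_\omega)/\partial\omega_j$, $j = 1,2,\ldots,q$, and $\ddot F$ is the $q \times q$ matrix with elements $\partial^2 L(\hat\theta_\omega)/\partial\omega_k\partial\omega_j$, $j,k = 1,2,\ldots,q$.

Since $\dot F = 0$ at $\omega = \omega_0$,

$$C_\ell(\omega_0) = 2|\ell^T \ddot F \ell| / \ell^T\ell. \qquad (14)$$

This simple form appears since $\omega_0$ is a global minimum of $LD(\omega)$ and thus the velocity and acceleration vectors are orthogonal; that is, every acceleration vector is orthogonal to the tangent plane at $\omega_0$. Unless indicated otherwise, we take $\omega^* = \omega_0$ in the remainder of this section.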


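For (14) to be useful we should have a straightforward way to evaluate $\ddot F$. Using the chain rule for differentiation, it is not difficult to verify that

$$\ddot F = J^T \ddot L J \qquad (15)$$

where $-\ddot L$ is the observed information for the postulated model ($\omega = \omega_0$) and $J$ is the $p \times q$ matrix with elements $\partial\hat\theta_{i\omega}/\partial\omega_j$, $i = 1,2,\ldots,p$, $j = 1,2,\ldots,q$, where $\hat\theta_{i\omega}$ is the $i$-th component of $\hat\theta_\omega$. Next, to evaluate $J$ we use the fact that

$$\left.\frac{\partial L(\theta \mid \omega)}{\partial\theta_j}\right|_{\theta = \hat\theta_\omega} = 0 \qquad (16)$$

for $j = 1,2,\ldots,p$ and all $\omega$ in $\Omega$. Differentiating both sides of (16) with respect to $\omega$ and evaluating at $\omega_0$ it follows that

$$J = -(\ddot L)^{-1} A \qquad (17)$$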


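where $A$ is the $p \times q$ matrix with elements

$$A_{ij} = \frac{\partial^2 L(\theta \mid \omega)}{\partial\theta_i\,\partial\omega_j}$$

evaluated at $\theta = \hat\theta$ and $\omega = \omega_0$, $i = 1,2,\ldots,p$, $j = 1,2,\ldots,q$. Substituting (17) into (15) we obtain

$$\ddot F = A^T(\ddot L)^{-1}A \qquad (18)$$

and therefore

$$C_\ell = 2|\ell^T A^T(\ddot L)^{-1}A\,\ell| / \ell^T\ell. \qquad (19)$$

The individual components of (19) are usually straightforward to obtain once the perturbation scheme has been defined.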

There are several obvious ways in which (19) might be used to study $\alpha(\omega)$ in practice. The extremes $C_{\max} = \max_\ell C_\ell$ and $C_{\min} = \min_\ell C_\ell$ are two useful options. Of course, $C_{\max}$ and $C_{\min}$ correspond to the maximum and minimum absolute eigenvalues of $\ddot F$ in (18). The eigenvectors associated with these eigenvalues can be used to set the directions in (9) which can then be used to construct plots, similar to those in Figure 1, of the lifted line $\alpha(\omega(a))$. Similarly, the eigenvectors associated with intermediate eigenvalues can be used to investigate the behavior of $\alpha(\omega)$ in directions corresponding to less extreme curvatures.
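The computations behind (18) and (19) can be sketched in a few lines. The following is an illustration under stated assumptions: the function names and the small matrices `A` and `L_ddot` are hypothetical inputs chosen only so that the Hessian is negative definite. With $\ddot F$ as defined in (18), the curvature in an eigenvector direction is twice the absolute value of the corresponding eigenvalue.

```python
import numpy as np

def f_ddot(A, L_ddot):
    """F-double-dot from (18): A^T (L_ddot)^{-1} A."""
    return A.T @ np.linalg.solve(L_ddot, A)

def curvature(A, L_ddot, direction):
    """Normal curvature (19) in the given direction."""
    l = np.asarray(direction, dtype=float)
    return 2.0 * abs(l @ f_ddot(A, L_ddot) @ l) / (l @ l)

def max_curvature(A, L_ddot):
    """C_max and its direction from the eigen-decomposition of F-double-dot."""
    F = f_ddot(A, L_ddot)
    F = 0.5 * (F + F.T)                       # symmetrize against round-off
    vals, vecs = np.linalg.eigh(F)
    k = int(np.argmax(np.abs(vals)))
    return 2.0 * abs(vals[k]), vecs[:, k]

# Hypothetical inputs: p = 2 parameters, q = 3 perturbations.
A = np.array([[1.0, 0.5, -0.2],
              [0.0, 2.0, 1.0]])
L_ddot = -np.array([[4.0, 1.0],
                    [1.0, 3.0]])              # negative definite Hessian
c_max, l_max = max_curvature(A, L_ddot)
print(c_max, l_max)
```

The direction `l_max` can then be inserted in (9) to trace the lifted line along which the influence graph bends most sharply.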


Another option is the average curvature $\bar C$ obtained by averaging (19) with respect to a uniform distribution on the surface of the unit sphere in $q$ dimensions: Let $S_q$ denote the surface area of a $q$-dimensional unit sphere and assume that $\ddot F$ is negative semidefinite. Then

$$\bar C = S_q^{-1} \int_{\|\ell\| = 1} C_\ell \, dS = 2(q(q+2))^{-1}[\mathrm{tr}^2(\ddot F) + 2\,\mathrm{tr}(\ddot F^2)] \qquad (20)$$

where $\ddot F$ is as defined in (18).

Finally, a relationship between the curvature and $LD$ can be obtained by expanding $LD(\omega(a)) = LD(\omega_0 + a\ell)$ as a function of $a$:

$$LD(\omega_0 + a\ell) = a^2 C_\ell/2 + o(a^2)$$

where $\|\ell\| = 1$. This representation provides a useful alternative interpretation of $C_\ell$.


3.2  Other  Influence  Graphs 


In this subsection we investigate the influence graphs $\alpha_s(\omega)$ and $\alpha'(\omega)$ defined in equations (7) and (8), respectively.

By replacing $\alpha$ with $\alpha'$ in the development that led to (10) and (11) and using the chain rule for differentiation, it is not difficult to verify that the velocity and acceleration vectors at $\omega_0$ for $\alpha'$ are the same as those for $\alpha$. It follows that $\alpha$ and $\alpha'$ have identical curvatures at $\omega_0$, although the two influence graphs can differ considerably in global behavior. Since we are primarily interested in assessing local influence around $\omega_0$, $\alpha$ and the analogous graph $\alpha_s$ for subsets will be used in the remainder of this paper.

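To develop the curvatures for $\alpha_s$ we first note that the development leading to (13) is valid with $L(\hat\theta_\omega)$ replaced by $L[\gamma(\hat\theta_{1\omega})]$ where $\gamma^T(\theta_1) = (\theta_1^T, g^T(\theta_1))$ and $g$ is defined following (7). It follows that (13) can be adapted for $\alpha_s$ by replacing $\dot F$ and $\ddot F$ by $\dot G = 2\,\partial L(\gamma)/\partial\omega$ and $\ddot G = \partial^2 L(\gamma)/\partial\omega^2$, respectively. Since $\dot G = 0$ at $\omega_0$, (14) is also valid with $\ddot F$ replaced by $\ddot G$. To find a useful expression for $\ddot G$, we again use the chain rule and obtain

$$\ddot G = K^T \ddot L K \qquad (21)$$

where $\ddot L$ is as defined following (15) and $K$ is the $p \times q$ matrix with elements $\partial\gamma_i(\hat\theta_{1\omega})/\partial\omega_j$, $i = 1,2,\ldots,p$, $j = 1,2,\ldots,q$, evaluated at $\omega_0$.

We next need to find a useful representation for $K$. Let $K_1$ denote the $p_1 \times q$ matrix $\partial\hat\theta_{1\omega}/\partial\omega$ and let $K_2$ denote the $p_2 \times p_1$ matrix $\partial g(\theta_1)/\partial\theta_1$ evaluated at $\theta_1 = \hat\theta_1$. Then

$$K = \begin{pmatrix} K_1 \\ K_2 K_1 \end{pmatrix}. \qquad (22)$$

Note that $K_1$ is just the matrix consisting of the first $p_1$ rows of $J$ defined in (17). To evaluate $K_2$ we make use of the fact that

$$\frac{\partial}{\partial g_i} L[\theta_1, g(\theta_1)] = 0 \quad \text{for all } \theta_1 \qquad (23)$$

where $g_i$ is the $i$-th component of $g$, and the derivative is evaluated at $g = g(\theta_1)$, $i = 1,2,\ldots,p_2$. Differentiating (23) with respect to $\theta_1$ and evaluating at $\hat\theta_1$ we find

$$K_2 = -(\ddot L_{22})^{-1}\ddot L_{21} \qquad (24)$$

where $\ddot L_{22}$ and $\ddot L_{21}$ are determined from the partition

$$\ddot L = \begin{pmatrix} \ddot L_{11} & \ddot L_{12} \\ \ddot L_{21} & \ddot L_{22} \end{pmatrix}. \qquad (25)$$

Finally, combining (14), (21) and (24) with the form of $K_1$ mentioned above, we obtain the normal curvature for subsets,

$$C_\ell = 2|\ell^T A^T(\ddot L^{-1} - B_{22})A\,\ell| / \ell^T\ell \qquad (26)$$

where

$$B_{22} = \begin{pmatrix} 0 & 0 \\ 0 & \ddot L_{22}^{-1} \end{pmatrix}.$$

The techniques discussed at the end of subsection 3.1 are applicable to (26) of course.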
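The agreement between the closed form (26) and the chain-rule route through (21), (22) and (24) can be checked numerically. The sketch below uses hypothetical matrices `A` and `L_ddot` (any negative definite `L_ddot` would serve) and assumes that the parameters of interest are the first `p1` components; the function names are illustrative.

```python
import numpy as np

def subset_curvature(A, L_ddot, p1, direction):
    """Curvature (26) when only the first p1 parameters are of interest."""
    l = np.asarray(direction, dtype=float)
    B22 = np.zeros_like(L_ddot)
    B22[p1:, p1:] = np.linalg.inv(L_ddot[p1:, p1:])
    M = A.T @ (np.linalg.inv(L_ddot) - B22) @ A
    return 2.0 * abs(l @ M @ l) / (l @ l)

def subset_curvature_via_K(A, L_ddot, p1, direction):
    """Same quantity from G = K^T L_ddot K, with K assembled from K1 and K2."""
    l = np.asarray(direction, dtype=float)
    J = -np.linalg.solve(L_ddot, A)                             # (17)
    K1 = J[:p1, :]                                              # first p1 rows of J
    K2 = -np.linalg.solve(L_ddot[p1:, p1:], L_ddot[p1:, :p1])   # (24)
    K = np.vstack([K1, K2 @ K1])                                # (22)
    G = K.T @ L_ddot @ K                                        # (21)
    return 2.0 * abs(l @ G @ l) / (l @ l)

# Hypothetical inputs: p = 3 parameters (p1 = 2 of interest), q = 4 perturbations.
L_ddot = -np.array([[4.0, 1.0, 0.5],
                    [1.0, 3.0, 0.2],
                    [0.5, 0.2, 2.0]])
A = np.array([[1.0, 0.5, -0.2, 0.3],
              [0.0, 2.0, 1.0, -1.0],
              [0.7, -0.4, 0.1, 0.6]])
l = np.array([1.0, -1.0, 0.5, 2.0])

c_direct = subset_curvature(A, L_ddot, 2, l)
c_via_K = subset_curvature_via_K(A, L_ddot, 2, l)
c_full = 2.0 * abs(l @ A.T @ np.linalg.solve(L_ddot, A) @ l) / (l @ l)
print(c_direct, c_via_K, c_full)
```

In this sketch the subset curvature never exceeds the full curvature (19) in the same direction, since subtracting $B_{22}$ removes a negative semidefinite piece from $\ddot L^{-1}$.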
4. APPLICATIONS


In this section we describe several possible applications of the ideas described in the previous sections. Our intent is to illustrate the range of possible use rather than to develop any particular application in full detail. The refinements of the individual applications and adaptations for applications not discussed here should be straightforward.


4.1  Case  Weights  in  Normal  Linear  Regression 


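Let $\omega$ denote the $n \times 1$ vector of case weights for the regression model (1) and again assume that $\sigma^2$ is known. The modified log likelihood is

$$L(\beta \mid \omega) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \omega_i (y_i - x_i^T\beta)^2 \qquad (27)$$

where $\omega_i$ and $y_i$ are the $i$-th components of $\omega$ and $Y$, respectively, and $x_i^T$ is the $i$-th row of $X$. Differentiating (27) with respect to $\beta$ and $\omega$, and evaluating at $\hat\beta$ and $\omega_0 = 1$, we find

$$A = X^T D(e)/\sigma^2 \qquad (28)$$

where $e = (e_i)$ is the $n$-vector of ordinary residuals when $\omega = 1$ and $D(e) = \mathrm{diag}(e_1, \ldots, e_n)$. Since $\ddot L(\hat\beta) = -X^TX/\sigma^2$,

$$C_\ell = 2|\ell^T A^T(\ddot L)^{-1}A\,\ell| / \ell^T\ell = 2\,\ell^T D(e) P_X D(e)\,\ell / (\ell^T\ell\,\sigma^2) \qquad (29)$$

where $P_X = X(X^TX)^{-1}X^T$ is the projection operator for the column space of $X$.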
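Formula (29) and the expansion $LD(\omega_0 + a\ell) = a^2 C_\ell/2 + o(a^2)$ from section 3.1 can be compared directly. The following sketch uses simulated data and illustrative function names (both are assumptions, not taken from the paper): it evaluates (29) for a unit direction and compares it with the likelihood difference at a small step $a$ along that direction.

```python
import numpy as np

def case_weight_curvature(X, y, sigma2, direction):
    """C_l from (29) for case-weight perturbations with known sigma^2."""
    l = np.asarray(direction, dtype=float)
    P = X @ np.linalg.solve(X.T @ X, X.T)     # projection onto the column space of X
    e = y - P @ y                             # ordinary residuals at omega = 1
    v = e * l                                 # D(e) l
    return 2.0 * (v @ P @ v) / ((l @ l) * sigma2)

def likelihood_difference(X, y, sigma2, w):
    """LD(omega) = 2[L(beta_hat) - L(beta_hat_omega)] for case weights w."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    XtW = X.T * w
    beta_w = np.linalg.solve(XtW @ X, XtW @ y)
    d = X @ (beta - beta_w)                   # change in fitted values, as in (4)
    return (d @ d) / sigma2

rng = np.random.default_rng(1)                # simulated data, for illustration only
n, sigma2 = 15, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([2.0, 1.0]) + rng.normal(size=n)

l = rng.normal(size=n)
l /= np.linalg.norm(l)                        # unit-length direction
a = 1e-3
c_l = case_weight_curvature(X, y, sigma2, l)
ld = likelihood_difference(X, y, sigma2, 1.0 + a * l)
ratio = ld / (a**2 * c_l / 2.0)               # should be close to 1 for small a
print(c_l, ratio)
```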


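When $\sigma^2$ is unknown, a similar calculation for $\theta^T = (\beta^T, \sigma^2)$ yields

$$A = \begin{pmatrix} X^T D(e)/\hat\sigma^2 \\ e_2^T/2\hat\sigma^4 \end{pmatrix} \qquad (30)$$

where $\hat\sigma^2$ is the maximum likelihood estimator of $\sigma^2$ and $e_2 = (e_i^2)$. Since

$$\ddot L(\hat\theta) = -\begin{pmatrix} X^TX/\hat\sigma^2 & 0 \\ 0 & n/2\hat\sigma^4 \end{pmatrix} \qquad (31)$$

we have the analogous result for $\theta$,

$$C_\ell = 2\,\ell^T[D(e) P_X D(e) + e_2 e_2^T/(2n\hat\sigma^2)]\,\ell / (\ell^T\ell\,\hat\sigma^2).$$

If only $\beta$ is of interest, the above results in combination with (26) show that the curvature is given by (29) with $\sigma^2$ replaced with $\hat\sigma^2$. The following three special cases should furnish some insight into the behavior of the curvature in this situation.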

First, for a simple random sample $\ddot F$ has only one nonzero eigenvalue, corresponding to $C_{\max}$, with eigenvector $\ell_{\max} = e/\|e\|$. Thus, the local changes in $\hat\beta$ will be zero when $\omega_0 = 1$ is perturbed in any direction $\ell'$ that is orthogonal to $e$. This is easily confirmed by direct calculation: $\hat\beta_\omega = \hat\beta$ for $\omega = 1 + a\ell'$. In this simple situation the maximum curvature is $C_{\max} = 2$ which is independent of the data. For this reason a curvature of 2 serves as a useful general reference. Experience has shown that curvatures smaller than 2 can generally be neglected while curvatures much larger than 2 suggest further investigation. Perturbations of the case weights in a simple random sample can therefore never result in serious local changes, although global changes resulting from gross errors can be serious, of course. It is well-known that a gross error is indicated by a relatively large element of $\ell_{\max}$. An important implication of this is that even if $C_{\max}$ is small an inspection of $\ell_{\max}$ may reveal the presence of gross errors. This idea will be illustrated further in later examples.
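For a simple random sample, $X$ is a column of ones, so (29) with $\sigma^2$ replaced by $\hat\sigma^2$ reduces to $C_\ell = 2(\ell^T e)^2/(n\hat\sigma^2\,\ell^T\ell)$. The short sketch below evaluates this reduced form for a handful of made-up observations (the data and the function name are illustrative assumptions) in the direction of the residual vector and in a direction orthogonal to it.

```python
def srs_curvature(y, direction):
    """Case-weight curvature for a simple random sample, sigma^2 at its MLE."""
    n = len(y)
    mean = sum(y) / n
    e = [yi - mean for yi in y]                       # residuals about the mean
    sigma2_hat = sum(ei * ei for ei in e) / n         # maximum likelihood estimate
    le = sum(li * ei for li, ei in zip(direction, e)) # l^T e
    ll = sum(li * li for li in direction)             # l^T l
    return 2.0 * le * le / (n * sigma2_hat * ll)

y = [3.1, 4.7, 2.2, 5.9, 4.0]                         # illustrative values
mean = sum(y) / len(y)
residuals = [yi - mean for yi in y]

c_along_e = srs_curvature(y, residuals)               # direction l_max = e / ||e||
c_orthogonal = srs_curvature(y, [1.0] * len(y))       # the ones vector is orthogonal to e
print(c_along_e, c_orthogonal)
```

The first value equals the reference curvature of 2 whatever the data, and the second is zero up to round-off, in line with the statement that perturbations orthogonal to $e$ leave the estimate unchanged locally.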

Second, the maximum curvature for simple linear regression through the origin with $X = (x_i)$ is

$$2\sum (x_i e_i)^2 / \hat\sigma^2 \sum x_i^2$$

which occurs in the direction $\ell \propto (x_i e_i)$. This curvature is bounded above by $2n$ and will tend to be large when the residuals attached to remote $x_i$'s are relatively large.


Finally, the curvature for the influence graph obtained by modifying the weight attached to a single case, say the $i$-th, is

$$C_\ell = 2e_i^2 h_{ii}/\sigma^2 = 2p(1-h_{ii})^2 D_i \qquad (32)$$

where $h_{ij}$ is the $(i,j)$-th element of $P_X$. Form (32) was used to construct Figure 1. For case A, $h_{ii} = .95$ and for case B $h_{ii} = .01$. Thus, case A corresponds to a high leverage point with a relatively small residual while B corresponds to a low leverage point with a large residual. In this example, perturbing the weight attached to case B would lead to changes in $\hat\beta$ that are uniformly larger than those obtained when the weight attached to case A is similarly modified, although the two cases would appear equally influential when deleted. Generally, high leverage points with relatively small residuals are influential only when considering the possibility of a gross error so that the case contains no relevant information about $\beta$. In the example of Figure 1, the variance of case A might be set at 10 times the variances of the remaining cases without any serious consequences while a similar modification of the variance of case B could lead to substantial changes.
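The two forms in (32) can be compared by computing $D_i$ from an actual refit without case $i$. This is a sketch on simulated data with illustrative function names (assumptions, not part of the paper), taking $\sigma^2$ as known.

```python
import numpy as np

def single_case_curvature(X, y, sigma2, i):
    """First form of (32): 2 e_i^2 h_ii / sigma^2, returned together with h_ii."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    return 2.0 * e[i] ** 2 * H[i, i] / sigma2, H[i, i]

def cooks_distance_by_deletion(X, y, sigma2, i):
    """D_i from (2), obtained by refitting without case i."""
    p = X.shape[1]
    keep = np.arange(len(y)) != i
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    d = X @ (beta - beta_i)                   # change in all n fitted values
    return (d @ d) / (p * sigma2)

rng = np.random.default_rng(2)                # simulated data, for illustration only
n, p, sigma2 = 12, 3, 1.5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(size=n)

i = 7
c_i, h_ii = single_case_curvature(X, y, sigma2, i)
d_i = cooks_distance_by_deletion(X, y, sigma2, i)
print(c_i, 2 * p * (1 - h_ii) ** 2 * d_i)     # the two forms of (32)
```

The factor $(1-h_{ii})^2$ is what separates the local measure from the deletion measure: for the same $D_i$, a high leverage case has a much smaller curvature than a low leverage case.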

We use the drill data as given in Cook and Weisberg (1982, p. 148) for a numerical illustration of the use of (29) with $\sigma^2 = \hat\sigma^2$. The data consist of $n = 31$ observations on the axial load on a drill bit under conditions set by three design variables. We use the full second-order response surface model so that there are 10 location parameters.

The maximum curvature $C_{\max} = 7.42$ occurs in approximately the direction $\ell = (\ell_i)$ with $\ell_5 = .61$, $\ell_6 = -.16$, $\ell_9 = -1$, $\ell_{26} = .38$, $\ell_{28} = -.15$, $\ell_{31} = .21$ and $\ell_i = 0$ otherwise. Five of the six cases with nonnegligible $\ell_i$'s correspond to the cases with the five largest ordinary residuals. Using $\ell$ as given here and (9) with $\omega^* = 1$, we have displayed $LD(\omega(a))$ in the direction of the maximum curvature in Figure 2. Clearly, appropriately modifying a few selected weights can substantially change $\hat\beta$ as measured using $L$. For example, at $a = .9$ the approximate weights are $\omega_5 = 1.55$, $\omega_6 = .86$, $\omega_9 = .1$, $\omega_{26} = 1.34$, $\omega_{28} = .86$, $\omega_{31} = 1.19$, and $\omega_i = 1$ otherwise. For these weights $LD(\omega) = 18.7$ so that $\hat\beta_\omega$ will lie on the edge of a 96% confidence region for $\beta$.
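The weights quoted for $a = .9$ follow from (9) with $\omega^* = 1$. The arithmetic can be reproduced as below; the dictionary layout is an illustrative choice, while the direction elements are the values reported in the text.

```python
# Nonzero elements of the direction reported for the drill data, keyed by case number.
direction = {5: 0.61, 6: -0.16, 9: -1.0, 26: 0.38, 28: -0.15, 31: 0.21}
a = 0.9

# omega(a) = omega* + a * l with omega* = 1 for every case, as in (9).
weights = {case: 1.0 + a * l_i for case, l_i in direction.items()}

# Approximate weights quoted in the text, for comparison.
reported = {5: 1.55, 6: 0.86, 9: 0.10, 26: 1.34, 28: 0.86, 31: 1.19}

for case in sorted(weights):
    print(f"case {case:2d}: computed {weights[case]:.3f}, reported {reported[case]:.2f}")
```

The computed weights agree with the quoted ones to within rounding; all other cases keep weight 1 because their direction elements are zero.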

Further information on $\alpha(\omega)$ could be obtained by looking in directions that correspond to smaller nonzero eigenvalues of $\ddot F$. Since $P_X$ has rank $p$ there will be at most $p$ such directions. This serves as a reminder that the sensitivity of an analysis to case weight perturbations can be expected to increase with $p$ for fixed $n$.


4.2 Correlations in Normal Linear Regression

In  linear  regression  the  assumption  of  uncorrelated  errors  is  often 
difficult  to  justify.  In  such  situations  it  may  be  important  to  ask  if  the 
analysis  is  sensitive  to  deviations  from  this  assumption. 

Let $\omega$ now denote an $n(n-1)/2 \times 1$ vector of error correlations indexed by $(i,j)$, $i < j$, and let $\Sigma_\omega = \mathrm{var}(\varepsilon)$. The $(i,j)$-th element of $\Sigma_\omega$ is $\omega_{ij}\sigma^2$ for $i < j$ and $\sigma^2$ for $i = j$. For convenience we assume $\sigma^2$ to be known.

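The log likelihood for the perturbed model can now be written as

$$L(\beta \mid \omega) = -\tfrac{1}{2}\log|\Sigma_\omega| - \tfrac{1}{2}(Y - X\beta)^T \Sigma_\omega^{-1}(Y - X\beta). \qquad (33)$$

Differentiating (33) with respect to $\beta$ and $\omega_{ij}$, and evaluating at $\hat\beta$ and $\omega_0 = 0$, it is not difficult to verify that the $(i,j)$-th column of $A$ is $-(x_i e_j + x_j e_i)/\sigma^2$ where $x_i^T$ is the $i$-th row of $X$. It follows from (18) that the $((i,j),(k,m))$-th element of $\ddot F$ is

$$-(h_{ki} e_m e_j + h_{mi} e_k e_j + h_{kj} e_m e_i + h_{mj} e_k e_i)/\sigma^2 \qquad (34)$$

where $h_{ij}$ is defined following (32).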


In this application $\ddot F$ is an $n(n-1)/2 \times n(n-1)/2$ matrix. The eigenvalues of $\ddot F$ can be determined by replacing $(X^TX)^{-1}$ with $(X^TX)^{-1/2}(X^TX)^{-1/2}$ and using the fact that the nonzero eigenvalues of $A^TA$ are the same as those of $AA^T$ which will be a manageable $p \times p$ matrix in this case. The eigenvectors of $\ddot F$ may be more of a problem but we expect that by using the structure of the experiment $\omega$ can be restricted to a subspace in many applications.

When only a single correlation is considered, $\ddot{F}$ becomes a scalar and
the corresponding curvature is

$$2(h_{ii}e_j^2 + 2h_{ij}e_ie_j + h_{jj}e_i^2)/\sigma^2 \qquad (35)$$

where (i,j) indexes the perturbed correlation.
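The eigenvalue device described above can be illustrated numerically. The
following is a minimal sketch, not part of the original development: the
function name `correlation_curvatures` is hypothetical, and it assumes the
conventions that the (i,j)-th column of $\Delta$ is
$-(x_i e_j + x_j e_i)/\sigma^2$, that $\ddot{L} = -X^TX/\sigma^2$, and that a
curvature is twice the absolute value of the corresponding quadratic form in
$\Delta^T \ddot{L}^{-1} \Delta$.

```python
import itertools
import numpy as np

def correlation_curvatures(X, y, sigma2):
    """Curvatures for perturbing error correlations (sigma2 assumed known)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat                     # ordinary residuals
    H = X @ XtX_inv @ X.T                    # h_ij = x_i' (X'X)^{-1} x_j
    pairs = list(itertools.combinations(range(n), 2))

    # One column of Delta per pair (i, j) with i < j.
    Delta = np.column_stack(
        [-(X[i] * e[j] + X[j] * e[i]) / sigma2 for i, j in pairs]
    )

    # Curvature when a single correlation is perturbed, as in (35).
    single = {
        (i, j): 2.0 * (H[i, i] * e[j] ** 2
                       + 2.0 * H[i, j] * e[i] * e[j]
                       + H[j, j] * e[i] ** 2) / sigma2
        for i, j in pairs
    }

    # Maximum curvature from the p x p matrix A A', where
    # A = sigma * (X'X)^{-1/2} Delta, instead of the large A'A.
    w, V = np.linalg.eigh(XtX_inv)
    root = V @ np.diag(np.sqrt(w)) @ V.T     # symmetric (X'X)^{-1/2}
    A = np.sqrt(sigma2) * root @ Delta
    c_max = 2.0 * np.linalg.eigvalsh(A @ A.T).max()
    return pairs, Delta, single, c_max
```

Only a p x p eigenproblem is solved for the maximum curvature, which is the
point of the device; the dictionary `single` gives the n(n-1)/2 curvatures
obtained when one correlation is perturbed at a time.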


4.3  Explanatory  Variables  in  Normal  Linear  Regression 

It  is  well  known  that  perturbations,  within  the  limits  of  measurement 
error,  of  the  explanatory  variables  in  linear  regression  can  seriously 
influence  the  results  of  a  least  squares  analysis,  particularly  when 
collinearity  is  present.  To  handle  this  situation  in  the  present  context, 
let $s_j$ denote the standard deviation of the measurement error associated
with the j-th explanatory variable. For convenience we again assume that
$\sigma^2$ is known. The following results can be easily adapted for the
situation in which $\sigma^2$ is unknown and only $\beta$ is of interest by
replacing $\sigma^2$ with $\hat\sigma^2$.

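The perturbed log likelihood $L(\beta \mid \omega)$ is constructed from (1)
with X replaced by

$$X_\omega = X + WS \qquad (36)$$

where $W = (\omega_{ij})$ is an n x p matrix of perturbations and
$S = \mathrm{diag}(s_j)$. The matrix $\Delta$ can be partitioned as
$\Delta = (\Delta_1, \ldots, \Delta_p)$, where the p x n block for the k-th
explanatory variable is

$$\Delta_k = s_k(u_k e^T - \hat\beta_k X^T)/\sigma^2 \qquad (37)$$

and $u_k$ is the k-th standard basis vector for $R^p$.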

In this application, $\ddot{F}$ is a potentially large np x np matrix and
determining the eigenvalues of $\ddot{F}$ may be an unpleasant task. However,
using the method described following (34), it can be shown that the nonzero
eigenvalues of $-\ddot{F}$ are

$$e^Te\,\delta_i/\sigma^2 + \sum_j \hat\beta_j^2 s_j^2/\sigma^2 \qquad (38)$$

where $\delta_i$ is the i-th eigenvalue of $S(X^TX)^{-1}S$, i = 1,2,...,p.
Thus,

$$C_{\max} = 2e^Te\,\delta_{\max}/\sigma^2 + 2\sum_j \hat\beta_j^2 s_j^2/\sigma^2. \qquad (39)$$
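One way to see where these eigenvalues come from is sketched below. The
sketch assumes $\ddot{L} = -X^TX/\sigma^2$, the block form
$\Delta_k = s_k(u_k e^T - \hat\beta_k X^T)/\sigma^2$ of (37), and the least
squares identity $X^Te = 0$; the intermediate steps are a reconstruction for
illustration rather than a quotation of the original argument.

```latex
\begin{align*}
\Delta\Delta^T
  &= \sum_{k} \Delta_k\Delta_k^T
   = \frac{1}{\sigma^4}\sum_{k} s_k^2\,
     (u_k e^T - \hat\beta_k X^T)(e\,u_k^T - \hat\beta_k X) \\
  &= \frac{1}{\sigma^4}\Bigl\{ e^Te\,S^2
     + \Bigl(\sum_j \hat\beta_j^2 s_j^2\Bigr) X^TX \Bigr\}
     \qquad \text{since } X^Te = 0, \\
\sigma^2 (X^TX)^{-1/2}\,\Delta\Delta^T\,(X^TX)^{-1/2}
  &= \frac{1}{\sigma^2}\Bigl\{ e^Te\,(X^TX)^{-1/2} S^2 (X^TX)^{-1/2}
     + \Bigl(\sum_j \hat\beta_j^2 s_j^2\Bigr) I_p \Bigr\}.
\end{align*}
```

The matrix $(X^TX)^{-1/2}S^2(X^TX)^{-1/2}$ has the same eigenvalues as
$S(X^TX)^{-1}S$, so each eigenvalue of the last display is an eigenvalue
$\delta_i$ scaled by $e^Te/\sigma^2$ and shifted by
$\sum_j \hat\beta_j^2 s_j^2/\sigma^2$.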


When only the k-th column of X is perturbed, $s_j = 0$ for $j \ne k$ and
(39) can be written as

$$C_{\max} = 2s_k^2(e^Te/\mathrm{RSS}_k + \hat\beta_k^2)/\sigma^2 \qquad (40)$$

where $\mathrm{RSS}_k$ is the residual sum of squares from the regression of
the k-th column of X on the remaining columns.
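The closed forms for the maximum curvature are easy to check against a direct
computation on small problems. The sketch below is illustrative only: the
function names are hypothetical, the maximum curvature is taken to be
$2e^Te\,\delta_{\max}/\sigma^2 + 2\sum_j \hat\beta_j^2 s_j^2/\sigma^2$ with
$\delta_{\max}$ the largest eigenvalue of $S(X^TX)^{-1}S$, and the brute-force
version assumes $\Delta_k = s_k(u_k e^T - \hat\beta_k X^T)/\sigma^2$ and
$\ddot{L} = -X^TX/\sigma^2$.

```python
import numpy as np

def cmax_explanatory(X, y, s, sigma2):
    """Closed-form maximum curvature when X is perturbed to X + W diag(s)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    S = np.diag(s)
    delta_max = np.linalg.eigvalsh(S @ XtX_inv @ S).max()
    return (2.0 * (e @ e) * delta_max / sigma2
            + 2.0 * np.sum(beta_hat ** 2 * s ** 2) / sigma2)

def cmax_single_column(X, y, k, s_k, sigma2):
    """Special case in which only column k is perturbed."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    others = np.delete(X, k, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
    rss_k = np.sum((X[:, k] - others @ coef) ** 2)
    return 2.0 * s_k ** 2 * ((e @ e) / rss_k + beta_hat[k] ** 2) / sigma2

def cmax_brute_force(X, y, s, sigma2):
    """Build the p x np matrix Delta and the np x np matrix directly."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    blocks = []
    for k in range(p):
        u_k = np.zeros(p)
        u_k[k] = 1.0
        blocks.append(s[k] * (np.outer(u_k, e) - beta_hat[k] * X.T) / sigma2)
    Delta = np.hstack(blocks)
    F = -sigma2 * Delta.T @ XtX_inv @ Delta   # Delta' Lddot^{-1} Delta
    return 2.0 * np.abs(np.linalg.eigvalsh(F)).max()
```

The brute-force route needs an np x np eigenproblem and is included only as a
check; the closed form needs a p x p one.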

For a first numerical illustration we use the perturbation scheme for the
Longley data that is described in Weisberg (1980, p. 70-72). For this setup,
which consists essentially of using the $s_j$'s to represent round-off errors
in the last digit of the explanatory variables, evaluating (39) with
$\sigma^2 = \hat\sigma^2$ gives $C_{\max} = .18$. Weisberg found that only one
significant digit in the $\hat\beta$'s would be stable under his perturbation
scheme. However, the small maximum curvature indicates that such variation
does not reflect important changes in the estimates when judged against the
log likelihood.

For a second numerical illustration we use the rat data from Weisberg
(1980, p. 110-113). This data set consists of 19 cases and 4 explanatory
variables, $X_0$ = constant, $X_1$ = body weight, $X_2$ = liver weight and
$X_3$ = relative dose. The perturbation schemes we consider are characterized
by $S = \mathrm{diag}(s_0, s_1, s_2, s_3) = \mathrm{diag}(0, 1, 0, s_3)$. For
$s_3$ = .01(.01).04 the maximum curvatures, obtained by setting
$\sigma^2 = \hat\sigma^2$ in (39), are $C_{\max}$ = 2.8, 9.4, 20.5 and 36.0,
respectively. The curvature for $s_3$ = .01 is relatively small while the
curvatures for the remaining $s_3$'s indicate that a minor perturbation
within the limits of these measurement errors may lead to drastic changes in
$\hat\beta$. At the very least, further investigation is indicated.

For example, a plot of $LD(\omega(a))$ in the direction of the eigenvector
corresponding to $C_{\max}$ is given in Figure 3 for $s_3$ = .03.
Interestingly, the largest element in the eigenvector for $C_{\max}$ always
corresponds to the relative dose for case 3, which is the anomalous case
identified by Weisberg (1980). The scale on the x-axis in Figure 3 is the
amount that the relative dose for case 3 is perturbed in units of $s_3$.
Thus, for example, a = .5 indicates that $a s_3$ = .015 was added to the
relative dose for case 3. Clearly, the influence of perturbations for
$s_3$ = .03 is very strong. In particular, the value of LD at a = -.5 shows
that $\hat\beta$ will be moved outside of a 95% confidence region for $\beta$
when perturbing each element of X by an amount that is no greater than 1/2 of
the respective standard deviations $s_j$. The nonmonotonic behavior in
Figure 3 arises since globally the lifted line $\alpha(\omega(a))$ need not
correspond to a path of monotonic descent. Kelly (1984) investigates various
extensions for use in errors in variables problems.
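A plot of this kind can be produced by refitting along the direction of
maximum curvature. The sketch below is a hedged illustration on arbitrary
inputs, not a reproduction of Figure 3: the function name `lifted_line` is
hypothetical, $\sigma^2$ is treated as known, the likelihood displacement is
taken to be $LD(\omega) = 2[L(\hat\beta) - L(\hat\beta_\omega)]$, and the
perturbations are stacked column by column of W so that they match the blocks
$\Delta_k = s_k(u_k e^T - \hat\beta_k X^T)/\sigma^2$.

```python
import numpy as np

def lifted_line(X, y, s, sigma2, a_grid):
    """LD(omega(a)) along the direction of maximum curvature for X + W diag(s)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat

    blocks = []
    for k in range(p):
        u_k = np.zeros(p)
        u_k[k] = 1.0
        blocks.append(s[k] * (np.outer(u_k, e) - beta_hat[k] * X.T) / sigma2)
    Delta = np.hstack(blocks)                  # p x (n p), blocks by column of W
    F = -sigma2 * Delta.T @ XtX_inv @ Delta    # Delta' Lddot^{-1} Delta
    vals, vecs = np.linalg.eigh(F)
    j = np.argmax(np.abs(vals))
    c_max = 2.0 * abs(vals[j])
    W_dir = vecs[:, j].reshape(p, n).T         # unit direction as an n x p matrix

    ld = []
    for a in a_grid:
        X_w = X + a * W_dir @ np.diag(s)       # perturbed explanatory variables
        beta_w, *_ = np.linalg.lstsq(X_w, y, rcond=None)
        # With sigma2 known, 2[L(beta_hat) - L(beta_w)] reduces to this form.
        ld.append(np.sum((X @ (beta_w - beta_hat)) ** 2) / sigma2)
    return np.array(ld), c_max, W_dir
```

Near a = 0 the returned values should behave like $C_{\max}a^2/2$; departures
from that parabola farther out are the global behavior that the curvature
alone does not describe.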

4.4  Case  Weights  in  Curved  Exponential  Families 

In  this  and  the  following  subsection,  we  indicate  how  previous  results 
can  be  extended  beyond  normal  linear  models. 


Let $y_i$ denote an observation from a regular curved exponential family
with minimal representation

$$f(y_i \mid \theta) = \exp\{y_i\eta_i(\theta) - \psi_i(\eta_i(\theta))\}.$$


[Figure 3: plot of $LD(\omega(a))$ against a.]


For a series $y_1, \ldots, y_n$ of independent observations the log
likelihood is therefore

$$L(\theta) = \sum_i (y_i\eta_i - \psi_i(\eta_i)). \qquad (41)$$

Next, the log likelihood obtained by attaching a weight $\omega_i$ to the
i-th case can be written as simply

$$L(\theta \mid \omega) = \sum_i \omega_i(y_i\eta_i - \psi_i(\eta_i)). \qquad (42)$$

Pregibon (1981) used a likelihood of the form (42) to derive various
diagnostics for logistic regression.


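Let $\eta = (\eta_i)$,

$$\dot\eta = \partial\eta/\partial\theta^T \quad (n \times p)$$

$$\ddot\eta_i = \partial^2\eta_i/\partial\theta\,\partial\theta^T \quad (p \times p)$$

and

$$\ddot\psi = \mathrm{diag}(\partial^2\psi_i/\partial\eta_i^2) \quad (n \times n)$$

where all derivatives are evaluated at $\hat\theta$, the maximum likelihood
estimator of $\theta$. Using the results of section 3.1 it is not difficult
to verify that

$$\ddot{F} = D_r\dot\eta\,\Bigl[\sum_i r_i\ddot\eta_i - \dot\eta^T\ddot\psi\,\dot\eta\Bigr]^{-1}\dot\eta^T D_r \qquad (43)$$

where $D_r$ is an n x n diagonal matrix with the score residuals

$$r_i = (y_i - \partial\psi_i/\partial\eta_i) \qquad (44)$$

as the diagonal entries.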

Many generalized linear models are special cases of (41) with
$\eta_i(\theta) = K(x_i^T\theta)$ where K is the link function. Further,

$$\dot\eta = \mathrm{diag}(\dot K_i)X$$

and

$$\ddot\eta_i = \ddot K_i x_i x_i^T$$

where $\dot K_i$ and $\ddot K_i$ are the first and second derivatives of K
evaluated at $x_i^T\hat\theta$, respectively. In particular,
$\ddot\eta_i = 0$ when the canonical link is used.
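As an illustration of case weight curvatures under a canonical link, the
sketch below works through Poisson regression with a log link. The Poisson
choice, the function names `fit_poisson` and `case_weight_curvature`, and the
Newton fitting routine are assumptions made for this example only. With the
canonical link K is the identity, so $\dot\eta = X$, $\ddot\eta_i = 0$,
$\ddot\psi = \mathrm{diag}(\mu_i)$ with $\mu_i = \exp(x_i^T\hat\theta)$, and
the score residuals are $r_i = y_i - \mu_i$.

```python
import numpy as np

def fit_poisson(X, y, w=None, iters=50):
    """Newton iterations for case-weighted Poisson regression, log link."""
    n, p = X.shape
    w = np.ones(n) if w is None else w
    # Rough starting value from a least squares fit to log(y + 0.5).
    theta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(iters):
        mu = np.exp(X @ theta)
        score = X.T @ (w * (y - mu))
        info = X.T @ (X * (w * mu)[:, None])
        step = np.linalg.solve(info, score)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-12:
            break
    return theta

def case_weight_curvature(X, y):
    """Maximum curvature and its direction for case weight perturbations."""
    theta = fit_poisson(X, y)
    mu = np.exp(X @ theta)
    r = y - mu                                   # score residuals
    eta_dot = X                                  # canonical link: K_dot = 1
    Lddot = -eta_dot.T @ (eta_dot * mu[:, None]) # eta_ddot_i = 0 here
    Delta = eta_dot.T * r                        # p x n, equals eta_dot' D_r
    F = Delta.T @ np.linalg.solve(Lddot, Delta)  # D_r eta_dot Lddot^{-1} eta_dot' D_r
    vals, vecs = np.linalg.eigh(F)
    j = np.argmax(np.abs(vals))
    return 2.0 * abs(vals[j]), vecs[:, j], theta
```

Large elements of the returned direction point to the cases whose weights
most affect the fit locally; refitting with weights $1 + a\,l$ for small a
gives a direct check of the curvature.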


4.5  Explanatory  Variables  in  Generalized  Linear  Models 


Consider the log likelihood (41) with $\eta_i = K(x_i^T\theta)$. The log
likelihood $L(\theta \mid \omega)$ obtained after the explanatory variables
have been perturbed by an amount $\omega$ can be constructed by replacing
$x_i^T$ with the i-th row of $X_\omega$ defined in (36). From this it can be
verified that $\Delta$ has the same structure as described in section 4.3
and that

$$\Delta_k = s_k\{u_k r^T \mathrm{diag}(\dot K_i) + \hat\theta_k X^T \mathrm{diag}(r_i\ddot K_i - \ddot\psi_i\dot K_i^2)\}$$

where $u_k$ is defined following (37) and $r = (r_i)$. Further, the observed
information matrix is

$$-\ddot{L} = -X^T \mathrm{diag}(r_i\ddot K_i - \ddot\psi_i\dot K_i^2)\,X.$$


For a concrete illustration we use the leukemia data as reported in Cook
and Weisberg (1982, p. 179). Here, a patient's survival time in weeks $y_i$,
i = 1,2,...,17, is assumed to follow a one parameter exponential
distribution with mean $\exp(\theta_1 + \theta_2x_i)$ where
$x_i = \log_{10}(\mathrm{WBC}_i)$ and $\mathrm{WBC}_i$ is the white blood
cell count for the i-th patient.


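The log likelihood for the original data is of the form given in (41) with
$\eta_i = K(\theta_1 + \theta_2x_i) = -\exp\{-(\theta_1 + \theta_2x_i)\}$ and
$\psi_i(\eta_i) = -\log(-\eta_i)$. From this it follows that

$$\dot K_i = \exp\{-(\theta_1 + \theta_2x_i)\}$$

$$\ddot K_i = -\dot K_i$$

$$\dot\psi_i = -\eta_i^{-1} = \exp(\theta_1 + \theta_2x_i) = \dot K_i^{-1}$$

$$\ddot\psi_i = \eta_i^{-2} = \exp\{2(\theta_1 + \theta_2x_i)\} = \mathrm{var}(Y_i)$$

$$r_i = y_i - \exp(\theta_1 + \theta_2x_i)$$

and thus that

$$r_i\ddot K_i - \ddot\psi_i\dot K_i^2 = -y_i\dot K_i.$$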


These calculations along with (45) and (46) can now be used to construct
$\ddot{F}$ as given in (18).
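The construction can be sketched numerically for an exponential regression
with mean $\exp(\theta_1 + \theta_2x_i)$. The code below is an illustration
under stated assumptions and does not use the leukemia data: the function
names are hypothetical, the Newton fitting routine is a stand-in for any
maximum likelihood fit, and only column k of X is perturbed, with
$\Delta_k = s_k\{u_k r^T \mathrm{diag}(\dot K_i) + \hat\theta_k X^T \mathrm{diag}(d_i)\}$,
$\ddot{L} = X^T \mathrm{diag}(d_i) X$ and
$d_i = r_i\ddot K_i - \ddot\psi_i\dot K_i^2$.

```python
import numpy as np

def exp_regression_pieces(X, y, theta):
    """Derivative quantities for the exponential model with log-linear mean."""
    t = X @ theta
    K_dot = np.exp(-t)                        # first derivative of K
    K_ddot = -K_dot                           # second derivative of K
    psi_ddot = np.exp(2.0 * t)                # var(Y_i)
    r = y - np.exp(t)                         # score residuals
    d = r * K_ddot - psi_ddot * K_dot ** 2    # should equal -y * K_dot
    return K_dot, r, d

def fit_exponential(X, y, iters=100):
    """Newton iterations for the maximum likelihood estimate."""
    theta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]   # rough start
    for _ in range(iters):
        K_dot, r, d = exp_regression_pieces(X, y, theta)
        score = X.T @ (r * K_dot)
        Lddot = X.T @ (X * d[:, None])
        step = np.linalg.solve(Lddot, score)
        theta = theta - step
        if np.max(np.abs(step)) < 1e-12:
            break
    return theta

def cmax_covariate(X, y, k, s_k):
    """Maximum curvature and direction when only column k of X is perturbed."""
    n, p = X.shape
    theta = fit_exponential(X, y)
    K_dot, r, d = exp_regression_pieces(X, y, theta)
    u_k = np.zeros(p)
    u_k[k] = 1.0
    Delta_k = s_k * (np.outer(u_k, r * K_dot) + theta[k] * X.T * d)
    Lddot = X.T @ (X * d[:, None])
    F = Delta_k.T @ np.linalg.solve(Lddot, Delta_k)
    vals, vecs = np.linalg.eigh(F)
    j = np.argmax(np.abs(vals))
    return 2.0 * abs(vals[j]), vecs[:, j]
```

Because $\Delta_k$ is proportional to $s_k$, the maximum curvature returned
here scales with $s_k^2$, so it suffices to compute it once for $s_k = 1$.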

To assess the influence of measurement errors associated with WBC we
perturb $x = \log_{10}(\mathrm{WBC})$ rather than WBC itself. This implies
that the measurement errors associated with WBC are multiplicative rather
than additive and that the standard deviation of $\mathrm{WBC}_i$ is
proportional to $E(\mathrm{WBC}_i)$. Both implications seem reasonable.
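Both implications can be seen from a first-order expansion. The sketch below
assumes, for illustration only, that the perturbed value is $x_i + s_x\omega_i$
with $\omega_i$ of mean zero and unit variance; the starred symbol is
notation introduced here.

```latex
\mathrm{WBC}_i^{\ast} = 10^{\,x_i + s_x\omega_i}
  = \mathrm{WBC}_i \cdot 10^{\,s_x\omega_i}
  \approx \mathrm{WBC}_i\,\{1 + (\ln 10)\,s_x\,\omega_i\},
\qquad
\mathrm{sd}(\mathrm{WBC}_i^{\ast}) \approx (\ln 10)\,s_x\,\mathrm{WBC}_i .
```

An additive error on the log scale therefore acts as a multiplicative factor
on WBC, with a standard deviation that grows in proportion to the count.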

The maximum curvature for this perturbation scheme is
$C_{\max} = 17.01\,s_x^2$ where $s_x$ is the standard deviation of the
measurement error associated with $x = \log_{10}\mathrm{WBC}$. Clearly, the
measurement error must be substantial for the local influence to be large.
The eigenvector associated with $C_{\max}$ lies substantially in the
direction of case 17. The largest element of this vector corresponds to
case 17 and is about 7 times larger than the second largest element. Thus,
although the local curvature is small, an inspection of the direction of
maximum curvature does direct attention to case 17, which is the case that
Cook and Weisberg (1982, p. 185) identified as influential by using case
deletion diagnostics.


5 .  DISCUSSION 


For a complete understanding of the influence of a particular perturbation
scheme it is probably necessary to know the full behavior of the selected
influence graph. We have found the central methodology discussed in this
paper to be a useful and relatively simple way of characterizing the local
behavior of an influence graph around $\omega_0$. The maximum curvature
$C_{\max}$ seems to be a reliable indicator of extreme local behavior, and
the plot of the corresponding lifted line provides a reasonably easy way to
confirm such indications. Also, the methodology can be easily adapted to
handle loss functions other than LD or LD1. In a Bayesian analysis, for
example, LD might be replaced with a loss function that reflects the
sensitivity of the analysis to perturbations in the prior parameters.

As demonstrated in Figure 1, gross errors can have a substantial influence
on an analysis even when the curvatures are small. To understand the
consequences of gross errors it is necessary to characterize the behavior of
an influence graph near the boundaries of $\Omega$, as case deletion
diagnostics attempt to do. Generally, this might be done by simply
evaluating an influence graph at various points near the boundary of
$\Omega$. However, our experience has shown that a plot of the lifted line
associated with $C_{\max}$ may indicate the seriousness of gross errors,
even when $C_{\max}$ is small. This happens, for example, in the numerical
illustrations of sections 4.3 and 4.5, and may be expected whenever the
influence graph is strongly quadratic.